PHASE 3 VALIDATION GUIDE
========================

Purpose
-------
This file defines the repeatable validation process for the Phase 3 Seven Metrics Engine.

Core Reproducibility Contract
-----------------------------
1) Source inputs are normalized Quran files matching *_normalized.xlsx.
2) Required input headers are sura, verse, and text.
3) The engine computes M1-M7 mechanically from the normalized text.
4) Repeated clean rebuilds must produce identical canonical outputs.
5) Registry and hash artifacts are generated after pipeline execution.
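
As an illustration of items 1 and 2, a pre-flight check might look like the sketch below. It is not the engine's own code. The function names are hypothetical, and it assumes that header matching is exact after trimming whitespace and that the header row has already been read from the workbook by whatever XLSX reader the pipeline uses.

```python
from fnmatch import fnmatchcase

# Headers named in the contract above.
REQUIRED_HEADERS = ("sura", "verse", "text")


def is_normalized_source(filename):
    """Return True if the file name matches the *_normalized.xlsx pattern."""
    return fnmatchcase(filename, "*_normalized.xlsx")


def missing_headers(header_row):
    """Return the required headers absent from header_row, in contract order.

    Assumption: names must match exactly once surrounding whitespace is removed.
    Empty cells (None) are ignored.
    """
    present = {str(cell).strip() for cell in header_row if cell is not None}
    return [name for name in REQUIRED_HEADERS if name not in present]
```

A source would be accepted only when is_normalized_source(name) is true and missing_headers(first_row) returns an empty list.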

Validation Steps
----------------
1) Run the full pipeline:

   python run_pipeline.py --clean

2) Run reproducibility verification:

   PowerShell -ExecutionPolicy Bypass -File .\verify_repro.ps1
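
If both steps need to run unattended, a small driver along the following lines could chain them. This is a sketch only: run_steps is a hypothetical helper, and it assumes that both run_pipeline.py and verify_repro.ps1 signal failure through a non-zero exit code, which this guide does not state.

```python
import subprocess

# The two commands from the steps above, in order.
STEPS = (
    ("python", "run_pipeline.py", "--clean"),
    ("PowerShell", "-ExecutionPolicy", "Bypass", "-File", r".\verify_repro.ps1"),
)


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    Returns the index of the first step whose exit code is non-zero,
    or None if every step succeeded. The runner argument exists so the
    function can be exercised without launching real processes.
    """
    for index, command in enumerate(steps):
        result = runner(command)
        if result.returncode != 0:
            return index
    return None
```

Calling run_steps() from the workspace root runs the pipeline first and starts verification only if the pipeline exits cleanly.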

Expected Results
----------------
A successful verification run reports:

[OK] Reproducibility verification passed.
[OK] Fresh rebuild succeeded, naming contract holds, and canonical Phase 3 outputs are deterministic.

Output Artifacts
----------------
organized_outputs\json\
organized_outputs\txt\
organized_outputs\csv\
organized_outputs\xlsx\
organized_outputs\audit\

Pass Criteria
-------------
A validation run counts as PASS only if all of the following hold:

1) All discovered *_normalized.xlsx source files are processed without error.
2) Each source produces JSON, TXT, CSV, XLSX, and summary TXT outputs.
3) JSON rows include M1-M7 fields.
4) CSV row counts match JSON row counts.
5) TXT row counts match JSON row counts.
6) XLSX files contain Metrics and Summary sheets.
7) M1, M2, and M3 values are internally consistent with the source text.
8) Repeated clean rebuilds produce identical canonical JSON, CSV, and TXT outputs.
9) workspace_registry_hash_audit.py reports PASS.
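
Criteria 3 and 4 lend themselves to a mechanical check. The sketch below is one possible way to do it, not the project's actual validator. It assumes that each JSON output is a top-level array of row objects, that the metric keys are literally named M1 through M7, and that each CSV output has exactly one header line. None of these details are confirmed by this guide.

```python
import csv
import json

# Assumed key names for the seven metrics.
METRIC_FIELDS = tuple(f"M{i}" for i in range(1, 8))


def load_json_rows(path):
    """Load a JSON output and return its rows (assumed to be a top-level list)."""
    with open(path, encoding="utf-8") as handle:
        rows = json.load(handle)
    if not isinstance(rows, list):
        raise ValueError(f"{path}: expected a top-level JSON array of rows")
    return rows


def count_csv_rows(path):
    """Count data rows in a CSV output, skipping one assumed header line."""
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def check_source(json_path, csv_path):
    """Return a list of problems; an empty list means criteria 3 and 4 hold."""
    problems = []
    rows = load_json_rows(json_path)
    for index, row in enumerate(rows):
        missing = [name for name in METRIC_FIELDS if name not in row]
        if missing:
            problems.append(f"JSON row {index} is missing {', '.join(missing)}")
    csv_count = count_csv_rows(csv_path)
    if csv_count != len(rows):
        problems.append(f"CSV has {csv_count} rows, JSON has {len(rows)}")
    return problems
```

The TXT row count in criterion 5 could be checked the same way once the TXT layout is known. It is left out here because this guide does not describe that layout.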

Canonical Artifact Policy
-------------------------
Canonical reproducibility is established using JSON, CSV, and TXT artifacts.
XLSX files are generated and included in registry/hash tracking for integrity visibility,
but are not treated as canonical deterministic artifacts.
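
The policy can be pictured as a byte-level comparison restricted to the canonical file types. The sketch below compares two snapshots of the output tree taken from separate clean rebuilds. It is an assumption-laden illustration rather than a description of verify_repro.ps1: the choice of SHA-256, the selection of files purely by extension, and the function names are all hypothetical.

```python
import hashlib
from pathlib import Path

# Only these file types count toward canonical reproducibility.
CANONICAL_SUFFIXES = {".json", ".csv", ".txt"}


def digest_tree(root):
    """Map each canonical file's relative path to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in CANONICAL_SUFFIXES:
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_builds(first_root, second_root):
    """Return sorted relative paths that differ or exist in only one build."""
    first = digest_tree(first_root)
    second = digest_tree(second_root)
    all_paths = first.keys() | second.keys()
    return sorted(p for p in all_paths if first.get(p) != second.get(p))
```

An empty list from diff_builds means the canonical artifacts of the two rebuilds are byte-identical. XLSX files are skipped on purpose, in line with the policy above.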

Current Status
--------------
Not yet frozen. Freezing is pending a passing run of verify_repro.ps1 in the local workspace.
